Mixtures of probability experts for audio retrieval and indexing
Author
Abstract
This paper describes a system for connecting non-speech sounds and words using linked multidimensional vector spaces. An approach based on a mixture of experts learns the mapping from one space to the other. This paper describes the conversion of audio and semantic data into their respective vector spaces. Two different mixture-of-probability-experts models are trained to learn the association between acoustic queries and the corresponding semantic explanation, and vice versa. Test results are presented based on commercial sound-effects CDs.
1. THE APPROACH
This paper describes a method of connecting sounds to words, and words to sounds. Given a description of a sound, the system finds the audio signals that best fit the words. Thus, a user might make a request with the description "the sound of a galloping horse," and the system responds by presenting recordings of a horse running on different surfaces, and possibly of musical pieces that sound like a horse galloping. Conversely, given a sound recording, the system describes the sound or the environment in which the recording was made. Thus, given a recording made outdoors, the system says confidently that the recording was made at a horse farm where several dogs reside.
A system that has these functions, called MPESAR (mixtures of probability experts for semantic–audio retrieval), learns the connections between a semantic space and an acoustic space. Semantic space maps words into a high-dimensional probabilistic space. Acoustic space describes sounds by a multidimensional vector. In general, the connection between these two spaces will be many to many. Horse sounds, for example, might include both footsteps and neighs.
Figure 1 shows one half of MPESAR: how to retrieve sounds from words. Annotations that describe sounds are clustered and represented with multinomial models. The sound files, or acoustic documents, that correspond to each node in the semantic space are modeled with Gaussian mixture models (GMMs).
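To make the semantic side concrete, here is a minimal sketch of how a text request might be scored against multinomial models of annotation clusters. It assumes each semantic node is a bag-of-words multinomial estimated from the pooled annotations of its cluster with additive (Laplace) smoothing; the smoothing choice, the toy clusters, and the helper names `fit_multinomial` and `query_log_likelihood` are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def fit_multinomial(annotations, vocabulary, alpha=1.0):
    """Estimate word probabilities for one semantic cluster.

    Hypothetical helper: pools the words of every annotation in the cluster
    and applies additive (Laplace) smoothing with pseudo-count `alpha`.
    """
    counts = Counter(word for text in annotations for word in text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def query_log_likelihood(query, model):
    """Log-probability of the query's in-vocabulary words under one multinomial.

    Words outside the vocabulary are skipped (an assumption of this sketch).
    """
    return sum(math.log(model[w]) for w in query.lower().split() if w in model)


# Toy clusters of sound annotations (invented for illustration).
clusters = {
    "horse": ["horse galloping on dirt", "horse neigh", "horse trotting on gravel"],
    "dog": ["dog barking", "dogs barking outdoors", "small dog growl"],
}
vocabulary = sorted({w for texts in clusters.values() for t in texts for w in t.lower().split()})
models = {name: fit_multinomial(texts, vocabulary) for name, texts in clusters.items()}

query = "the sound of a galloping horse"
best = max(models, key=lambda name: query_log_likelihood(query, models[name]))
print(best)  # prints: horse
```

Smoothing keeps a word that never appears in a cluster's annotations from driving that cluster's log-likelihood to negative infinity, so every node still receives a finite score.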
Given a semantic request, MPESAR identifies the portion of the semantic space that best fits the request, and then measures the likelihood that each sound in the database fits the GMM linked to this portion of the semantic space. The most likely sounds are returned to satisfy the user's semantic request. Figure 2 shows the other half of MPESAR: how to generate words to describe a sound. MPESAR analyzes the collection of sounds and builds models for arbitrary sounds. This approach gives us a multi-dimensional representation of any sound, and a distance …
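The sound-retrieval step described above (score every sound against the GMM linked to the chosen portion of the semantic space and return the most likely ones) can be sketched as follows. This is a minimal illustration under stated assumptions: the best-fitting semantic node has already been selected, each sound is summarized by a short sequence of 2-D feature frames, the GMM uses diagonal covariances, and a sound's score is its mean per-frame log-likelihood. The parameter values, features, and the helper names `gmm_log_likelihood` and `rank_sounds` are hypothetical, not the paper's.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of vector x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) for a diagonal-covariance GMM, computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def rank_sounds(sounds, gmm):
    """Return sound names sorted from most to least likely under the GMM.

    Each sound is a list of feature frames; its score is the mean per-frame
    log-likelihood (an assumption, so that long and short files are comparable).
    """
    scores = {
        name: sum(gmm_log_likelihood(f, *gmm) for f in frames) / len(frames)
        for name, frames in sounds.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical two-component GMM linked to the "horse" semantic node.
horse_gmm = (
    [0.6, 0.4],                # mixture weights
    [[1.0, 0.0], [3.0, 2.0]],  # component means
    [[0.5, 0.5], [1.0, 1.0]],  # diagonal variances
)

# Toy 2-D feature frames for three database sounds (invented values).
sounds = {
    "gallop.wav": [[1.1, 0.1], [2.9, 1.8], [1.0, -0.2]],
    "rain.wav": [[-4.0, 5.0], [-3.5, 4.8]],
    "neigh.wav": [[3.2, 2.1], [2.7, 2.4]],
}

print(rank_sounds(sounds, horse_gmm))  # ['gallop.wav', 'neigh.wav', 'rain.wav']
```

The log-sum-exp step avoids underflow when every component assigns a frame a very small density, which is common once feature vectors have many dimensions.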